21 research outputs found

    Inferring Haplotypes of Copy Number Variations From High-Throughput Data With Uncertainty

    Accurate information on haplotypes and diplotypes (haplotype pairs) is required for population-genetic analyses. However, microarrays do not provide data on a haplotype or diplotype at a copy number variation (CNV) locus. They provide only the total number of copies over a diplotype, or an unphased sequence genotype (e.g., AAB, as opposed to the AB genotype of a single nucleotide polymorphism). Moreover, such copy numbers or genotypes are often called incorrectly when the microarray signal intensities derived from different copy numbers or genotypes are not clearly separated because of noise. Here we report an algorithm that infers CNV haplotypes and individuals' diplotypes at multiple loci from noisy microarray data by using the probability that a signal intensity was derived from each possible underlying copy number or genotype. In simulation studies based on known diplotypes and an error model obtained from real microarray data, we demonstrate that this probabilistic approach achieves accurate inference (error rate: 1–2%) from noisy data, whereas previous deterministic approaches failed (error rate: 12–18%). Applying this algorithm to real microarray data, we estimated haplotype frequencies and diplotypes in 1486 CNV regions for 100 individuals. Our algorithm will facilitate accurate population-genetic analyses and powerful disease association studies of CNVs.
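The probabilistic idea above (weighting each possible copy number by the probability that it produced the observed signal, rather than calling a single value) can be illustrated with a minimal expectation-maximization (EM) sketch for one CNV locus. This is not the authors' algorithm. The function name `em_cnv_haplotype_freqs`, the restriction to 0, 1 or 2 copies per haplotype, and the input format (one dict per individual giving P(signal | total copy number), for example from an assumed error model) are all hypothetical. Noisy samples contribute fractionally to every diplotype they are compatible with, instead of being forced into one possibly wrong call.

```python
from itertools import product


def em_cnv_haplotype_freqs(likelihoods, copy_states=(0, 1, 2),
                           max_iter=500, tol=1e-10):
    """Hypothetical single-locus EM for CNV haplotype frequencies.

    likelihoods: one dict per individual mapping a total copy number
        (sum over both haplotypes) to P(signal | total copy number).
    copy_states: assumed copy numbers a single haplotype can carry.
    Returns estimated frequencies, in the same order as copy_states.
    """
    k = len(copy_states)
    freqs = [1.0 / k] * k  # start from uniform frequencies
    for _ in range(max_iter):
        expected = [0.0] * k  # expected haplotype counts (E-step)
        for lik in likelihoods:
            # Weight every ordered diplotype (a, b) by prior x likelihood.
            weights = {
                (a, b): freqs[a] * freqs[b]
                * lik.get(copy_states[a] + copy_states[b], 0.0)
                for a, b in product(range(k), repeat=2)
            }
            z = sum(weights.values())
            if z == 0.0:
                continue  # signal incompatible with every diplotype
            for (a, b), w in weights.items():
                expected[a] += w / z
                expected[b] += w / z
        total = sum(expected)
        if total == 0.0:
            break
        new = [e / total for e in expected]  # M-step: renormalize counts
        converged = max(abs(x - y) for x, y in zip(new, freqs)) < tol
        freqs = new
        if converged:
            break
    return freqs


if __name__ == "__main__":
    # Hypothetical per-individual likelihoods over total copy number.
    noisy = [{1: 0.7, 2: 0.3}, {2: 0.9, 3: 0.1},
             {2: 0.5, 3: 0.5}, {0: 0.2, 1: 0.8}]
    print(em_cnv_haplotype_freqs(noisy))
```

Multi-locus haplotypes, as in the abstract, could in principle be handled by letting each haplotype state be a tuple of per-locus copy numbers; that extension, and the AAB-style sequence genotypes, are not covered by this sketch.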

    Big Data Pipelines on the Computing Continuum: Tapping the Dark Data

    The computing continuum enables new opportunities for managing big data pipelines concerning efficient management of heterogeneous and untrustworthy resources. We discuss the big data pipelines lifecycle on the computing continuum and its associated challenges, and we outline a future research agenda in this area.

    Big Data Pipelines on the Computing Continuum: Ecosystem and Use Cases Overview

    Organisations possess and continuously generate huge amounts of static and stream data, especially with the proliferation of Internet of Things technologies. Collected but unused data, i.e., Dark Data, represent lost potential for value creation. In this respect, the concept of the Computing Continuum extends the traditional, more centralised Cloud Computing paradigm with Fog and Edge Computing to ensure low-latency pre-processing and filtering close to the data sources. However, major challenges remain, in particular in managing the various phases of Big Data processing on the Computing Continuum. In this paper, we set forth an ecosystem for Big Data pipelines on the Computing Continuum and introduce five relevant real-life use cases in the context of the proposed ecosystem.

    Microduplications of 16p11.2 are associated with schizophrenia

    Recurrent microdeletions and microduplications of a 600 kb genomic region of chromosome 16p11.2 have been implicated in childhood-onset developmental disorders (refs. 1–3). Here we report the strong association of 16p11.2 microduplications with schizophrenia in two large cohorts. In the primary sample, the microduplication was detected in 12/1906 (0.63%) cases and 1/3971 (0.03%) controls (P = 1.2 × 10⁻⁵, OR = 25.8). In the replication sample, the microduplication was detected in 9/2645 (0.34%) cases and 1/2420 (0.04%) controls (P = 0.022, OR = 8.3). For the series combined, microduplication of 16p11.2 was associated with a 14.5-fold increased risk of schizophrenia (95% CI: 3.3–62). A meta-analysis of multiple psychiatric disorders showed a significant association of the microduplication with schizophrenia, bipolar disorder and autism. The reciprocal microdeletion was associated only with autism and developmental disorders. Analysis of patient clinical data showed that head circumference was significantly larger in patients with the microdeletion than in patients with the microduplication (P = 0.0007). Our results suggest that the microduplication of 16p11.2 confers substantial risk for schizophrenia and other psychiatric disorders, whereas the reciprocal microdeletion is associated with contrasting clinical features.

    Microduplications of 16p11.2 are associated with schizophrenia

    Recurrent microdeletions and microduplications of a 600-kb genomic region of chromosome 16p11.2 have been implicated in childhood-onset developmental disorders (refs. 1–3). We report the association of 16p11.2 microduplications with schizophrenia in two large cohorts. The microduplication was detected in 12/1,906 (0.63%) cases and 1/3,971 (0.03%) controls (P = 1.2 × 10⁻⁵, OR = 25.8) from the initial cohort, and in 9/2,645 (0.34%) cases and 1/2,420 (0.04%) controls (P = 0.022, OR = 8.3) of the replication cohort. The 16p11.2 microduplication was associated with a 14.5-fold increased risk of schizophrenia (95% CI (3.3, 62)) in the combined sample. A meta-analysis of datasets for multiple psychiatric disorders showed a significant association of the microduplication with schizophrenia (P = 4.8 × 10⁻⁷), bipolar disorder (P = 0.017) and autism (P = 1.9 × 10⁻⁷). In contrast, the reciprocal microdeletion was associated only with autism and developmental disorders (P = 2.3 × 10⁻¹³). Head circumference was larger in patients with the microdeletion than in patients with the microduplication (P = 0.0007).

    Mouse genomic representational oligonucleotide microarray analysis: Detection of copy number variations in normal and tumor specimens

    Genomic amplifications and deletions, the consequence of somatic variation, are a hallmark of human cancer. Such variation has also been observed between "normal" individuals, as well as in individuals with congenital disorders. Thus, copy number measurement is likely to be an important tool for the analysis of genetic variation, genetic disease, and cancer. With this aim in mind, we developed representational oligonucleotide microarray analysis (ROMA), a high-resolution comparative genomic hybridization methodology, and reported its use in the study of humans. Here we report the development of a ROMA microarray for the genomic analysis of the mouse, an important model system for many genetic diseases and cancer. This microarray was designed based on the sequence assembly MM3 and contains ≈84,000 probes randomly distributed throughout the mouse genome. We demonstrate the use of this array to identify copy number changes in mouse cancers, as well as to determine copy number variation between inbred strains of mice. Because restriction endonuclease digestion of genomic DNA is an integral component of our method, differences due to polymorphisms at the restriction enzyme cleavage sites are also observed between strains, and these can be useful to follow the inheritance of loci between crosses of different strains.
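The point that polymorphisms at cleavage sites change which fragments are represented can be illustrated with an in-silico digest. This is a minimal sketch, assuming a BglII-like six-base site (AGATCT, cut after the first base) and an assumed amplifiable size window of 200–1200 bp. The function name `representation_fragments` and these defaults are illustrative, not taken from the paper. Only fragments with a cut site at both ends are kept, on the assumption that adaptors must ligate to sticky ends on both sides for a fragment to be amplified.

```python
import re


def representation_fragments(seq, site="AGATCT", cut_offset=1,
                             min_len=200, max_len=1200):
    """Hypothetical in-silico representation of a genomic sequence.

    Cuts `seq` at every occurrence of `site` (cut `cut_offset` bases into
    the site, as for BglII A^GATCT) and returns (start, end) coordinates of
    fragments bounded by cut sites on both sides whose length lies within
    [min_len, max_len]. The size limits are illustrative assumptions.
    """
    seq = seq.upper()
    # Zero-width lookahead so overlapping matches would also be found.
    cuts = [m.start() + cut_offset
            for m in re.finditer(f"(?={re.escape(site)})", seq)]
    fragments = []
    for start, end in zip(cuts, cuts[1:]):
        # Keep only fragments with a sticky end on each side, in the window.
        if min_len <= end - start <= max_len:
            fragments.append((start, end))
    return fragments


if __name__ == "__main__":
    site = "AGATCT"
    ref = "C" * 50 + site + "C" * 300 + site + "C" * 300 + site + "C" * 50
    # A single-base change (A -> T) inside the middle site destroys it.
    alt = ref[:358] + "T" + ref[359:]
    print(representation_fragments(ref, max_len=500))  # [(51, 357), (357, 663)]
    print(representation_fragments(alt, max_len=500))  # []
```

In the toy example, losing the middle site merges two 306 bp fragments into one 612 bp fragment; with the narrower window used there, that merged fragment drops out of the representation, so probes in the interval would lose signal in one "strain" but not the other.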

    DataCloud: Enabling the Big Data Pipelines on the Computing Continuum

    With the recent development of Internet of Things (IoT) and cloud-based technologies, massive amounts of data are generated by heterogeneous sources and stored through dedicated cloud solutions. Organizations often generate far more data than they are able to interpret, and current Cloud Computing technologies cannot fully meet the requirements of Big Data processing applications or cope with their data transfer overheads. Much of the data is stored for compliance purposes only and is never used or turned into value, thus becoming Dark Data, which represent not only untapped value but also a risk for organizations.